Models have many uses in ecology

  • Organizing data
  • Developing theoretical principles
  • Exploring hypotheticals
  • Explaining observed patterns
  • Guiding policy
  • Making forecasts

The dialogue between scientist and Nature

  • We have questions about the causes of things…
  • …but Nature does not answer our questions directly.
  • We must propose hypothetical answers to our questions.
  • Models are nothing other than hypothetical answers to questions we ask.
  • Models are necessarily partial answers, at best.

The dialogue between scientist and Nature

  • It is relatively easy to ask difficult questions.
  • It is harder, but still relatively easy, to propose creative hypothetical answers.
  • It is often much more difficult to hear what the data have to say.
  • Before the data will give a response, they must be asked in the proper way.
  • Their responses are always equivocal, but some answers are better than others.
  • One difficulty: how do we quantify this?
  • Another difficulty: the answer we get depends on the question we ask.
    How do we avoid misunderstanding the reply?

Mechanistic models as scientific instruments

When we seek to understand biological phenomena, models are tools for looking at the data.

They focus attention on the discrepancies between our hypotheses and reality.

How to quantify these discrepancies?

Quantifying discrepancy between model and data

  • The natural and optimal quantification is in terms of surprise: \(-\log{p}\)
  • A better answer is one in which the data are less surprising.
  • In this view, a model must be generative before Nature will respond.
  • Thus, properly speaking, a model is a probability distribution.
  • We must model the noise and the error.
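
As a minimal sketch of "surprise" as a score, the following compares two hypothetical answers to the same question. The counts, the Poisson form, and the two candidate means are all assumptions made up for illustration; they are not taken from any data set discussed here.

```python
import math

def poisson_logpmf(k, mean):
    """Log probability of the count k under a Poisson distribution with the given mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def surprise(data, mean):
    """Total surprise, -log p, of the data under an i.i.d. Poisson model."""
    return -sum(poisson_logpmf(k, mean) for k in data)

# Hypothetical weekly case counts (invented for this example).
counts = [3, 5, 4, 6, 2, 5]

surprise_a = surprise(counts, mean=4.0)   # hypothesis A: about 4 cases per week
surprise_b = surprise(counts, mean=10.0)  # hypothesis B: about 10 cases per week

# The hypothesis under which the data are less surprising is the better answer.
print("surprise under A:", surprise_a)
print("surprise under B:", surprise_b)
```

Both hypotheses can be scored only because each one assigns a probability to every possible data set, which is the sense in which a model must be generative.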

Mechanistic models as scientific instruments

  • In seeking to infer causal mechanism in intact biological systems, dynamics are especially informative.
  • Hypotheses about causal mechanisms can often readily be formalized as dynamical systems models.
  • The problems of inference
    • How can we estimate unknown parameters?
    • How can we decide among competing models?
    • How can we extract maximum information from time series data?
  • How do we know when we need a better model vs. better data?

Overview

  1. Markov models
  2. Examples: cholera and pertussis
  3. Partially observed Markov processes
  4. Plug and play inference methods
  5. Plug and play methods in practice
  6. Conclusions and recommendations

Endemic cholera

What are the roles of seasonal and decadal climate drivers in the epidemiology of cholera?

What is the best vaccination strategy?

Questions

  1. Roles of seasonal and decadal climate drivers
    • complex seasonality
    • multiennial cycles
  2. Importance of bacteriophage in environment
  3. Relative importance of human-human vs environmental transmission
  4. Durations of vaccine- and infection-induced immunity

Transmission models for endemic cholera

A large family of models:

  • How well does each of the models explain the data?
  • What insights can we gain into the system’s dynamic self-regulation?

These questions involve inescapable technical complications:

  • nonlinearity
  • stochasticity
  • nonstationarity
  • time-varying parameters

The ongoing pertussis resurgence


(Lavine, King, and Bjørnstad 2011)

The ongoing pertussis resurgence


(Domenech de Cellès, Magpantay, King, and Rohani 2016)

Question

Why is pertussis resurgent?

Hypothetical answers include:

  • Changes in vaccine efficacy
  • Vaccine-driven pathogen evolution
  • Increased circulation of congeneric pathogens
  • Asymptomatic transmission
  • Loss of natural immune boosting
  • Increased surveillance sensitivity

Key unknowns

  • durability of vaccine-induced immunity
  • relative efficacy of natural- and vaccine-derived immunity

Modes of vaccine failure

\[\lambda(I_1,I_2,t) = \frac{\beta(t)\,(I_1+\theta\,I_2)+\bar{\beta}\,\iota}{N}\]

Post-vaccination infections are observed at a reduced rate, \(\eta\).

(Magpantay, Domenech de Cellès, Rohani, and King 2016)
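
The force of infection above can be written as a small function. This is only a sketch: the slide does not define the symbols, so the readings used here (\(I_1\) and \(I_2\) as two classes of infection, \(\theta\) as the relative infectiousness of the second class, \(\iota\) as imported infections, \(\bar{\beta}\) as the mean transmission rate) are assumptions, as are the cosine form of \(\beta(t)\) and all numerical values.

```python
import math

def beta(t, beta_bar, amplitude):
    """Assumed seasonal transmission rate: cosine forcing about the mean beta_bar."""
    return beta_bar * (1.0 + amplitude * math.cos(2.0 * math.pi * t))

def force_of_infection(i1, i2, t, beta_bar, amplitude, theta, iota, n):
    """lambda(I1, I2, t) = (beta(t) * (I1 + theta * I2) + beta_bar * iota) / N."""
    return (beta(t, beta_bar, amplitude) * (i1 + theta * i2) + beta_bar * iota) / n

# Hypothetical values, chosen only to exercise the formula.
lam = force_of_infection(i1=50, i2=20, t=0.0, beta_bar=100.0,
                         amplitude=0.1, theta=0.5, iota=2.0, n=1e5)
print(lam)
```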

Modes of vaccine failure

  • Can we estimate the nature and durability of vaccine-induced protection?
  • Which models explain the data adequately?

(Magpantay et al. 2016)

Pertussis resurgence in progress

Pertussis in Massachusetts

(Domenech de Cellès, Magpantay, King, and Rohani 2018)

Modes of vaccine failure

(Domenech de Cellès et al. 2018)

Time-varying vaccination rates

(Domenech de Cellès et al. 2018)

Time-varying demographic rates

(Domenech de Cellès et al. 2018)

Age-specific contact rates

(Domenech de Cellès et al. 2018; Rohani, Zhong, and King 2010)

Age-specific contact rates

(Fumanelli, Ajelli, Manfredi, Vespignani, and Merler 2012; Mossong et al. 2008)

Modes of vaccine failure

  • Can we estimate the nature and durability of vaccine-induced protection?
  • Which models explain the data adequately?
  • Questions of parameter estimation and model selection

(Domenech de Cellès et al. 2018)

Sources of error

The key principle is that we must model the error.

  • Measurement error
    • Error
    • Finite precision
  • Process noise
    • Environmental trends and fluctuations
    • Unmodeled heterogeneities
    • Model misspecification
    • Unmodeled variables

Partially observed Markov processes


(King, Nguyen, and Ionides 2016)

Central problem

The full joint density is:

\[f_{X,Y}(x,y;\theta) = f_0(x_0;\theta)\,\prod_{n=1}^N\!f_{n}(x_n|x_{n-1};\theta)\,g_{n}(y_n|x_n;\theta).\]

The likelihood function is the marginal density for \(Y\), evaluated at the data:

\[ \begin{split} \mathcal{L}(\theta)&=f_{Y}(y^*_1,\dots,y^*_N;\theta)\\ &=\int f_{X,Y}(x_0,\dots,x_N,y^*_1,\dots,y^*_N;\theta)\, dx_0\cdots dx_N. \end{split} \]
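
A standard way to organize this high-dimensional integral is to factor the likelihood into one-step conditional densities. The shorthand \(y^*_{1:n}\) for \(y^*_1,\dots,y^*_n\) is introduced here for convenience and is not notation from the slides; for \(n=1\) the conditioning set is empty.

\[\mathcal{L}(\theta)=\prod_{n=1}^N f_{Y_n|Y_{1:n-1}}(y^*_n|y^*_{1:n-1};\theta),\]

where, by the Markov property and the conditional independence of the observations given the states, each factor is a one-dimensional-in-time integral:

\[f_{Y_n|Y_{1:n-1}}(y^*_n|y^*_{1:n-1};\theta)=\int g_n(y^*_n|x_n;\theta)\,f_{X_n|Y_{1:n-1}}(x_n|y^*_{1:n-1};\theta)\,dx_n.\]

Sequential Monte Carlo methods approximate each of these factors in turn by simulation.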

Perils of mechanistic models

  • computing the likelihood requires integrating over \(X_t\)
    for all \(t\)
  • traditional solution—be very clever:
    • simplify the model to make it statistically tractable
    • devise and tune an inference algorithm (e.g., MCMC sampler) specifically to the model
  • but: are we testing just our hypothesis or the statistical method also?
  • it is perilous to invest too much time in one model
    • the Pygmalion problem
    • an antidote: the method of multiple working hypotheses

The “plug-and-play” property

Definition: an algorithm has the plug-and-play property if it has no need to compute the latent process transition density.

Plug-and-play methods access the latent process model only via simulation.

This puts essentially no restrictions on the form of the models that can be entertained.

They are also called “simulation-based” methods.

(He, Ionides, and King 2010; King et al. 2016)
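
To make the definition concrete, here is a minimal sketch of a model specified only through simulators. The chain-binomial epidemic, the function names `rinit`, `rprocess`, and `rmeasure`, and all parameter values are assumptions for illustration; this is not the interface of any particular package. Note that nothing in the code evaluates a transition density: the latent process is available only through `rprocess`.

```python
import math
import random

def rbinom(n, p, rng):
    """Draw a Binomial(n, p) variate by summing n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def rinit():
    """Initial latent state x_0: susceptibles S, infecteds I, new cases C."""
    return {"S": 990, "I": 10, "C": 0}

def rprocess(x, beta, gamma, pop, rng):
    """Simulate the latent state forward one time step (process noise enters here)."""
    p_inf = 1.0 - math.exp(-beta * x["I"] / pop)  # per-susceptible infection probability
    p_rec = 1.0 - math.exp(-gamma)                # per-infected recovery probability
    new_inf = rbinom(x["S"], p_inf, rng)
    new_rec = rbinom(x["I"], p_rec, rng)
    return {"S": x["S"] - new_inf, "I": x["I"] + new_inf - new_rec, "C": new_inf}

def rmeasure(x, rho, rng):
    """Simulate an observation: each new case is reported with probability rho."""
    return rbinom(x["C"], rho, rng)

def simulate(n_steps, beta=1.5, gamma=0.5, rho=0.6, pop=1000, seed=42):
    """Generate one simulated data set y_1, ..., y_N from the model."""
    rng = random.Random(seed)
    x = rinit()
    ys = []
    for _ in range(n_steps):
        x = rprocess(x, beta, gamma, pop, rng)
        ys.append(rmeasure(x, rho, rng))
    return ys

reports = simulate(n_steps=20)
print(reports)
```

Swapping in a different mechanism means rewriting `rprocess` and nothing else, which is what makes it cheap to entertain several competing hypotheses.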

Simulation-based inference methods

  • Feature-based methods
    • Approximate Bayesian computation (ABC)
    • Nonlinear forecasting (NLF)
    • Attractor-reconstruction-based methods
    • Probe matching
    • Synthetic likelihood
  • Full-information methods (likelihood based)
    • Sequential Monte Carlo (the particle filter)
    • Particle Markov chain Monte Carlo
    • Iterated filtering

Variants of all of these are available in the pomp software package (King et al. 2016; https://kingaa.github.io/pomp/) and elsewhere.
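
As an illustration of the simplest full-information method in the list above, the following is a sketch of a bootstrap particle filter that estimates the log likelihood. The toy model (a Gaussian random walk observed with Gaussian error), the function names, the known initial state, and the data are all assumptions chosen for brevity; this is not the implementation found in pomp. The latent process enters only through the simulator `rprocess`, so the algorithm is plug-and-play; only the measurement density is evaluated.

```python
import math
import random

def rprocess(x, sigma, rng):
    """Simulate one step of the latent process: a Gaussian random walk."""
    return x + rng.gauss(0.0, sigma)

def dmeasure_log(y, x, tau):
    """Log density of the observation y given the latent state x: Normal(x, tau)."""
    return -0.5 * ((y - x) / tau) ** 2 - math.log(tau) - 0.5 * math.log(2.0 * math.pi)

def pfilter_loglik(ys, sigma, tau, n_particles=500, seed=1):
    """Bootstrap particle filter estimate of the log likelihood of ys."""
    rng = random.Random(seed)
    particles = [0.0] * n_particles  # assumed known initial state x_0 = 0
    loglik = 0.0
    for y in ys:
        # Prediction: push every particle through the process simulator.
        particles = [rprocess(x, sigma, rng) for x in particles]
        # Weighting: evaluate the measurement density at each particle.
        logw = [dmeasure_log(y, x, tau) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]  # rescaled to avoid underflow
        # The mean weight estimates the conditional likelihood of this observation.
        loglik += m + math.log(sum(w) / n_particles)
        # Resampling: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik

# Hypothetical observations, invented for this example.
ys = [0.3, 1.1, 0.8, 1.9, 2.4, 1.7, 2.2, 3.0]
ll = pfilter_loglik(ys, sigma=1.0, tau=1.0)
print(ll)
```

The estimate is itself random: rerunning with a different seed gives a slightly different value, and that Monte Carlo error has to be kept small relative to the likelihood differences being compared.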

Common model comparison criteria

  • The best model agrees with already published results
  • The best model is the one that I can estimate before my thesis is due
  • The best model is the one that best agrees with my preconceptions
  • The best model is the one that makes my supervisor the happiest
  • An identifiable model is better than a non-identifiable model
  • The best model is the one with the narrowest confidence intervals

Valid model comparison criteria

  • How well does the model fit the data?
    • How does the likelihood compare to benchmark values?
    • Does the model fit well? (fitting best \(\ne\) fitting well)
  • How does the model explain the data?
    • What aspects of the data are explained?
    • What aspects are ascribed to noise?
  • Model adequacy
    • Are the data consistent with the model’s assumptions?
    • If the model were correct, how plausible are the data?

Valid model comparison criteria

  • Predictive power
    • Can the model make accurate out-of-sample predictions?
    • What predictions do the parameter estimates make about the results of independent studies?
  • Fertility
    • Does the model make testable predictions?

Pertussis and vaccine failure

\[\lambda(I_1,I_2,t) = \frac{\beta(t)\,(I_1+\theta\,I_2)+\bar{\beta}\,\iota}{N}\]

Post-vaccination infections are observed at a reduced rate, \(\eta\).

(Magpantay et al. 2016)

Profile likelihood

Interpretation: in vaccinated hosts, infections are mild to asymptomatic, yet equally infectious

Limits to information

Flat profiles indicate lack of information in the data relative to the question.

In effect, the data refuse to answer the question.

(Magpantay et al. 2016)
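
A toy sketch of how the flatness of a profile reflects the information in the data. The model here (i.i.d. normal observations, profiling over the mean with the variance maximized out analytically), the data values, and the grid are all assumptions for illustration and have nothing to do with the pertussis analysis. The cutoff of 1.92 log-likelihood units is half the 0.95 quantile of the chi-squared distribution with one degree of freedom, the usual approximate 95% threshold.

```python
import math

def profile_loglik(data, mu):
    """Profile log likelihood of the mean mu, with the variance maximized out."""
    n = len(data)
    s2 = sum((x - mu) ** 2 for x in data) / n  # variance MLE for this fixed mu
    return -0.5 * n * (math.log(2.0 * math.pi * s2) + 1.0)

def profile_interval(data, grid):
    """Approximate 95% interval: grid values within 1.92 units of the profile maximum."""
    prof = [profile_loglik(data, mu) for mu in grid]
    cutoff = max(prof) - 1.92
    inside = [mu for mu, p in zip(grid, prof) if p >= cutoff]
    return min(inside), max(inside)

# Hypothetical measurements: a small sample and a larger one.
few = [4.1, 5.9, 5.2]
many = [4.1, 5.9, 5.2, 4.7, 5.5, 4.9, 5.3, 4.4, 5.8, 5.0, 4.6, 5.4]
grid = [i / 100 for i in range(300, 701)]

lo_few, hi_few = profile_interval(few, grid)
lo_many, hi_many = profile_interval(many, grid)
print("few data: ", lo_few, hi_few)
print("many data:", lo_many, hi_many)
```

With fewer observations the profile is flatter and the interval wider. In the limit of a completely flat profile, no value of the parameter is preferred over any other.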

Endemic cholera


(King, Ionides, Pascual, and Bouma 2008)

Profile likelihood

(King et al. 2008)

Endemic cholera

Is it necessary that all infected individuals be equally infectious?
(King et al. 2008)

Asymptomatic infections

among symptomatic infections, case fatality: \(0.34\pm 0.2\)
duration of immunity \(1.5 \pm 0.7~\text{yr}\)
(King et al. 2008)

Perspective

  • Simulation-based methods make it possible to obtain answers to questions posed in the form of models that embody our precise questions.
  • One need not carefully tailor the statistical algorithm to the model.
  • It is perilous to invest too much time in one model: the Pygmalion problem.

Conclusions

  • To obtain answers to our questions, we must pose them properly: as generative, stochastic models.
  • Therefore, we must take care to model the noise.
  • Simulation-based inference methods facilitate scientific investigation.
  • Effective, statistically efficient inference methods are available.
  • There is an intense need for further methodological development of such methods to improve computational efficiency and accommodate new data types.

References

Domenech de Cellès M, Magpantay FMG, King AA, Rohani P (2016). “The Pertussis Enigma: Reconciling Epidemiology, Immunology, and Evolution.” Proc R Soc Lond B, 283(1822), 20152309. https://doi.org/10.1098/rspb.2015.2309.

Domenech de Cellès M, Magpantay FMG, King AA, Rohani P (2018). “The Impact of Past Vaccination Coverage and Immunity on Pertussis Resurgence.” Sci Transl Med, 10(434), eaaj1748. https://doi.org/10.1126/scitranslmed.aaj1748.

Fumanelli L, Ajelli M, Manfredi P, Vespignani A, Merler S (2012). “Inferring the Structure of Social Contacts from Demographic Data in the Analysis of Infectious Diseases Spread.” PLoS Computational Biology, 8(9), e1002673. https://doi.org/10.1371/journal.pcbi.1002673.

He D, Ionides EL, King AA (2010). “Plug-and-Play Inference for Disease Dynamics: Measles in Large and Small Populations as a Case Study.” J R Soc Interface, 7, 271–283. https://doi.org/10.1098/rsif.2009.0151.

King AA, Ionides EL, Pascual M, Bouma MJ (2008). “Inapparent Infections and Cholera Dynamics.” Nature, 454(7206), 877–880. https://doi.org/10.1038/nature07084.

King AA, Nguyen D, Ionides EL (2016). “Statistical Inference for Partially Observed Markov Processes via the R Package pomp.” J Stat Softw, 69(12), 1–43. https://doi.org/10.18637/jss.v069.i12.

Lavine JS, King AA, Bjørnstad ON (2011). “Natural Immune Boosting in Pertussis Dynamics and the Potential for Long-Term Vaccine Failure.” Proc Natl Acad Sci, 108(17), 7259–7264. https://doi.org/10.1073/pnas.1014394108.

Magpantay FMG, Domenech de Cellès M, Rohani P, King AA (2016). “Pertussis Immunity and Epidemiology: Mode and Duration of Vaccine-Induced Immunity.” Parasitology, 143, 835–849. https://doi.org/10.1017/S0031182015000979.

Mossong J, Hens N, Jit M, Beutels P, Auranen K, Mikolajczyk R, Massari M, Salmaso S, Tomba GS, Wallinga J, et al. (2008). “Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases.” PLoS Medicine, 5(3), e74. https://doi.org/10.1371/journal.pmed.0050074.

Rohani P, Zhong X, King AA (2010). “Contact Network Structure Explains the Changing Epidemiology of Pertussis.” Science, 330(6006), 982–985. https://doi.org/10.1126/science.1194134.